Dataset statistics
| Number of variables | 20 |
|---|---|
| Number of observations | 39759 |
| Missing cells | 16212 |
| Missing cells (%) | 2.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 6.1 MiB |
| Average record size in memory | 160.0 B |
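The derived figures in this table can be checked against the raw counts. The sketch below assumes that "Missing cells (%)" is the missing-cell count divided by observations times variables, and that 1 MiB is 1024² bytes; the variable names are illustrative only.

```python
# Re-derive the summary percentages from the counts reported in the table.
n_variables = 20
n_observations = 39759
missing_cells = 16212
avg_record_bytes = 160.0

total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
total_mib = avg_record_bytes * n_observations / 1024**2

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the reported 2.0% and 6.1 MiB.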
Variable types
| NUM | 16 |
|---|---|
| BOOL | 2 |
| CAT | 2 |
Reproduction
| Analysis started | 2020-06-06 12:26:13.678057 |
|---|---|
| Analysis finished | 2020-06-06 12:27:08.357995 |
| Duration | 54.68 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
Warnings
| Warning | Type |
|---|---|
| DATE has a high cardinality: 9942 distinct values | High cardinality |
| X_3 is highly correlated with X_2 | High correlation |
| X_2 is highly correlated with X_3 | High correlation |
| MULTIPLE_OFFENSE has 15903 (40.0%) missing values | Missing |
| X_10 is highly skewed (γ1 = 30.92348051) | Skewed |
| X_12 is highly skewed (γ1 = 26.54109191) | Skewed |
| INCIDENT_ID has unique values | Unique |
| X_1 has 31814 (80.0%) zeros | Zeros |
| X_4 has 5588 (14.1%) zeros | Zeros |
| X_5 has 7908 (19.9%) zeros | Zeros |
| X_7 has 5794 (14.6%) zeros | Zeros |
| X_8 has 14634 (36.8%) zeros | Zeros |
| X_11 has 4268 (10.7%) zeros | Zeros |
| X_12 has 8517 (21.4%) zeros | Zeros |
| X_14 has 458 (1.2%) zeros | Zeros |
| X_15 has 1680 (4.2%) zeros | Zeros |
df_index
Real number (ℝ≥0)
| Distinct count | 23856 |
|---|---|
| Unique (%) | 60.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10336.960009054554 |
| Minimum | 0 |
| Maximum | 23855 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 993.9 |
| Q1 | 4969.5 |
| median | 9939 |
| Q3 | 14909 |
| 95-th percentile | 21867.1 |
| Maximum | 23855 |
| Range | 23855 |
| Interquartile range (IQR) | 9939.5 |
Descriptive statistics
| Standard deviation | 6378.244484 |
|---|---|
| Coefficient of variation (CV) | 0.617032907 |
| Kurtosis | -0.893746517 |
| Mean | 10336.96001 |
| Median Absolute Deviation (MAD) | 4970 |
| Skewness | 0.2791311245 |
| Sum | 410987193 |
| Variance | 40682002.69 |
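Several of the descriptive statistics are simple functions of one another. The sketch below re-derives four of them from the df_index values reported above; it uses the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, IQR = Q3 - Q1, sum = mean × number of observations) and is a consistency check, not the profiler's own code.

```python
import math

# Values copied from the df_index tables above.
n = 39759
mean = 10336.960009054554
std = 6378.244484
q1, q3 = 4969.5, 14909

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance is the squared standard deviation
iqr = q3 - q1          # interquartile range
total = mean * n       # sum of all values

print(cv, variance, iqr, round(total))
```

Up to rounding of the displayed digits, the results agree with the reported CV, variance, IQR, and sum.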
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2047 | 2 | < 0.1% | |
| 5646 | 2 | < 0.1% | |
| 9768 | 2 | < 0.1% | |
| 11817 | 2 | < 0.1% | |
| 13866 | 2 | < 0.1% | |
| 1580 | 2 | < 0.1% | |
| 3629 | 2 | < 0.1% | |
| 5678 | 2 | < 0.1% | |
| 7727 | 2 | < 0.1% | |
| 9800 | 2 | < 0.1% | |
| Other values (23846) | 39739 | 99.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% | |
| 1 | 2 | < 0.1% | |
| 2 | 2 | < 0.1% | |
| 3 | 2 | < 0.1% | |
| 4 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 23855 | 1 | < 0.1% | |
| 23854 | 1 | < 0.1% | |
| 23853 | 1 | < 0.1% | |
| 23852 | 1 | < 0.1% | |
| 23851 | 1 | < 0.1% |
INCIDENT_ID
Categorical
| Distinct count | 39759 |
|---|---|
| Unique (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| CR_69942 | 1 | < 0.1% | |
| CR_188343 | 1 | < 0.1% | |
| CR_84718 | 1 | < 0.1% | |
| CR_89395 | 1 | < 0.1% | |
| CR_24753 | 1 | < 0.1% | |
| CR_108827 | 1 | < 0.1% | |
| CR_30379 | 1 | < 0.1% | |
| CR_80628 | 1 | < 0.1% | |
| CR_93834 | 1 | < 0.1% | |
| CR_184355 | 1 | < 0.1% | |
| Other values (39749) | 39749 | > 99.9% |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 8.444201313 |
| Min length | 4 |
DATE
Categorical
| Distinct count | 9942 |
|---|---|
| Unique (%) | 25.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 13-SEP-01 | 36 | 0.1% | |
| 12-SEP-01 | 34 | 0.1% | |
| 15-SEP-01 | 27 | 0.1% | |
| 17-SEP-01 | 25 | 0.1% | |
| 11-SEP-01 | 24 | 0.1% | |
| 14-SEP-01 | 22 | 0.1% | |
| 18-SEP-01 | 19 | < 0.1% | |
| 20-SEP-01 | 17 | < 0.1% | |
| 01-MAY-92 | 17 | < 0.1% | |
| 19-SEP-01 | 16 | < 0.1% | |
| Other values (9932) | 39522 | 99.4% |
Length
| Max length | 9 |
|---|---|
| Median length | 9 |
| Mean length | 9 |
| Min length | 9 |
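Every DATE value is a 9-character string such as `13-SEP-01`, which is why the column is profiled as a high-cardinality categorical. The sketch below shows one way to parse such strings into dates, assuming the format is day, English month abbreviation, two-digit year (inferred from the sample values only). The function name `parse_incident_date` is hypothetical.

```python
from datetime import date, datetime

def parse_incident_date(text: str) -> date:
    """Parse a string such as '13-SEP-01' (day, month abbreviation, 2-digit year)."""
    # %b matches month abbreviations case-insensitively but depends on the locale;
    # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return datetime.strptime(text, "%d-%b-%y").date()

for raw in ["13-SEP-01", "01-MAY-92"]:
    print(raw, "->", parse_incident_date(raw).isoformat())
```

Whether the two-digit-year pivot is right for this dataset is an assumption that should be checked against the data's actual date range.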
X_1
Real number (ℝ≥0)
| Distinct count | 8 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.47750194924419626 |
| Minimum | 0 |
| Maximum | 7 |
| Zeros | 31814 |
| Zeros (%) | 80.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 3 |
| Maximum | 7 |
| Range | 7 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.428754965 |
|---|---|
| Coefficient of variation (CV) | 2.992144779 |
| Kurtosis | 13.88980442 |
| Mean | 0.4775019492 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.815518275 |
| Sum | 18985 |
| Variance | 2.041340749 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 31814 | 80.0% | |
| 1 | 5761 | 14.5% | |
| 7 | 1426 | 3.6% | |
| 5 | 458 | 1.2% | |
| 3 | 228 | 0.6% | |
| 4 | 48 | 0.1% | |
| 2 | 17 | < 0.1% | |
| 6 | 7 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 31814 | 80.0% | |
| 1 | 5761 | 14.5% | |
| 2 | 17 | < 0.1% | |
| 3 | 228 | 0.6% | |
| 4 | 48 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 1426 | 3.6% | |
| 6 | 7 | < 0.1% | |
| 5 | 458 | 1.2% | |
| 4 | 48 | 0.1% | |
| 3 | 228 | 0.6% |
X_2
Real number (ℝ≥0)
| Distinct count | 52 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.763776754948566 |
| Minimum | 0 |
| Maximum | 52 |
| Zeros | 40 |
| Zeros (%) | 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 7 |
| median | 24 |
| Q3 | 36 |
| 95-th percentile | 48 |
| Maximum | 52 |
| Range | 52 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 15.23552157 |
|---|---|
| Coefficient of variation (CV) | 0.6152341673 |
| Kurtosis | -1.307501292 |
| Mean | 24.76377675 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.09338549112 |
| Sum | 984583 |
| Variance | 232.1211175 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 6724 | 16.9% | |
| 36 | 3657 | 9.2% | |
| 33 | 3573 | 9.0% | |
| 24 | 2257 | 5.7% | |
| 21 | 2088 | 5.3% | |
| 37 | 1606 | 4.0% | |
| 45 | 1545 | 3.9% | |
| 49 | 1486 | 3.7% | |
| 3 | 1307 | 3.3% | |
| 22 | 1091 | 2.7% | |
| Other values (42) | 14425 | 36.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 40 | 0.1% | |
| 1 | 33 | 0.1% | |
| 2 | 194 | 0.5% | |
| 3 | 1307 | 3.3% | |
| 4 | 6724 | 16.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 25 | 0.1% | |
| 51 | 162 | 0.4% | |
| 50 | 279 | 0.7% | |
| 49 | 1486 | 3.7% | |
| 48 | 98 | 0.2% |
X_3
Real number (ℝ≥0)
| Distinct count | 52 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 24.61249025377902 |
| Minimum | 0 |
| Maximum | 52 |
| Zeros | 33 |
| Zeros (%) | 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 4 |
| Q1 | 8 |
| median | 24 |
| Q3 | 35 |
| 95-th percentile | 48 |
| Maximum | 52 |
| Range | 52 |
| Interquartile range (IQR) | 27 |
Descriptive statistics
| Standard deviation | 15.13187718 |
|---|---|
| Coefficient of variation (CV) | 0.6148048011 |
| Kurtosis | -1.23901782 |
| Mean | 24.61249025 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | -0.08088102605 |
| Sum | 978568 |
| Variance | 228.9737069 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 4 | 6724 | 16.9% | |
| 34 | 3657 | 9.2% | |
| 32 | 3573 | 9.0% | |
| 24 | 2257 | 5.7% | |
| 23 | 2088 | 5.3% | |
| 37 | 1606 | 4.0% | |
| 45 | 1545 | 3.9% | |
| 49 | 1486 | 3.7% | |
| 2 | 1307 | 3.3% | |
| 22 | 1091 | 2.7% | |
| Other values (42) | 14425 | 36.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 33 | 0.1% | |
| 1 | 40 | 0.1% | |
| 2 | 1307 | 3.3% | |
| 3 | 194 | 0.5% | |
| 4 | 6724 | 16.9% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 52 | 25 | 0.1% | |
| 51 | 279 | 0.7% | |
| 50 | 162 | 0.4% | |
| 49 | 1486 | 3.7% | |
| 48 | 1066 | 2.7% |
X_4
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.279735405820066 |
| Minimum | 0 |
| Maximum | 10 |
| Zeros | 5588 |
| Zeros (%) | 14.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 10 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 2.956637769 |
|---|---|
| Coefficient of variation (CV) | 0.6908459259 |
| Kurtosis | -1.018315231 |
| Mean | 4.279735406 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1871304045 |
| Sum | 170158 |
| Variance | 8.741706897 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 9078 | 22.8% | |
| 2 | 7883 | 19.8% | |
| 0 | 5588 | 14.1% | |
| 7 | 4781 | 12.0% | |
| 4 | 3369 | 8.5% | |
| 3 | 3160 | 7.9% | |
| 9 | 2320 | 5.8% | |
| 10 | 2113 | 5.3% | |
| 1 | 1461 | 3.7% | |
| 5 | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5588 | 14.1% | |
| 1 | 1461 | 3.7% | |
| 2 | 7883 | 19.8% | |
| 3 | 3160 | 7.9% | |
| 4 | 3369 | 8.5% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 2113 | 5.3% | |
| 9 | 2320 | 5.8% | |
| 7 | 4781 | 12.0% | |
| 6 | 9078 | 22.8% | |
| 5 | 6 | < 0.1% |
X_5
Real number (ℝ≥0)
| Distinct count | 5 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.4527528358359114 |
| Minimum | 0 |
| Maximum | 5 |
| Zeros | 7908 |
| Zeros (%) | 19.9% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 3 |
| Q3 | 5 |
| 95-th percentile | 5 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 4 |
Descriptive statistics
| Standard deviation | 1.96318419 |
|---|---|
| Coefficient of variation (CV) | 0.8004003343 |
| Kurtosis | -1.556820375 |
| Mean | 2.452752836 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.1743576897 |
| Sum | 97519 |
| Variance | 3.854092163 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 12238 | 30.8% | |
| 1 | 11252 | 28.3% | |
| 3 | 8355 | 21.0% | |
| 0 | 7908 | 19.9% | |
| 2 | 6 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 7908 | 19.9% | |
| 1 | 11252 | 28.3% | |
| 2 | 6 | < 0.1% | |
| 3 | 8355 | 21.0% | |
| 5 | 12238 | 30.8% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 12238 | 30.8% | |
| 3 | 8355 | 21.0% | |
| 2 | 6 | < 0.1% | |
| 1 | 11252 | 28.3% | |
| 0 | 7908 | 19.9% |
X_6
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.126461933147212 |
| Minimum | 1 |
| Maximum | 19 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 5 |
| Q3 | 8 |
| 95-th percentile | 15 |
| Maximum | 19 |
| Range | 18 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 4.463585046 |
|---|---|
| Coefficient of variation (CV) | 0.7285746806 |
| Kurtosis | 0.06079304921 |
| Mean | 6.126461933 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.9704193921 |
| Sum | 243582 |
| Variance | 19.92359146 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5794 | 14.6% | |
| 5 | 4476 | 11.3% | |
| 6 | 4390 | 11.0% | |
| 4 | 3869 | 9.7% | |
| 2 | 3863 | 9.7% | |
| 15 | 3822 | 9.6% | |
| 7 | 3728 | 9.4% | |
| 3 | 2909 | 7.3% | |
| 8 | 2356 | 5.9% | |
| 9 | 2098 | 5.3% | |
| Other values (9) | 2454 | 6.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 5794 | 14.6% | |
| 2 | 3863 | 9.7% | |
| 3 | 2909 | 7.3% | |
| 4 | 3869 | 9.7% | |
| 5 | 4476 | 11.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 5 | < 0.1% | |
| 18 | 264 | 0.7% | |
| 17 | 183 | 0.5% | |
| 16 | 1026 | 2.6% | |
| 15 | 3822 | 9.6% |
X_7
Real number (ℝ≥0)
| Distinct count | 19 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.870947458437083 |
| Minimum | 0 |
| Maximum | 18 |
| Zeros | 5794 |
| Zeros (%) | 14.6% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 4 |
| Q3 | 7 |
| 95-th percentile | 12 |
| Maximum | 18 |
| Range | 18 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.870959307 |
|---|---|
| Coefficient of variation (CV) | 0.7947035644 |
| Kurtosis | 0.5203116861 |
| Mean | 4.870947458 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.7988064995 |
| Sum | 193664 |
| Variance | 14.98432596 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5794 | 14.6% | |
| 6 | 4476 | 11.3% | |
| 4 | 4390 | 11.0% | |
| 2 | 3869 | 9.7% | |
| 7 | 3863 | 9.7% | |
| 10 | 3822 | 9.6% | |
| 1 | 3728 | 9.4% | |
| 5 | 2909 | 7.3% | |
| 3 | 2356 | 5.9% | |
| 8 | 2098 | 5.3% | |
| Other values (9) | 2454 | 6.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5794 | 14.6% | |
| 1 | 3728 | 9.4% | |
| 2 | 3869 | 9.7% | |
| 3 | 2356 | 5.9% | |
| 4 | 4390 | 11.0% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 18 | 240 | 0.6% | |
| 17 | 327 | 0.8% | |
| 16 | 339 | 0.9% | |
| 15 | 39 | 0.1% | |
| 14 | 31 | 0.1% |
X_8
Real number (ℝ≥0)
| Distinct count | 27 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9781684650016349 |
| Minimum | 0 |
| Maximum | 99 |
| Zeros | 14634 |
| Zeros (%) | 36.8% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 3 |
| Maximum | 99 |
| Range | 99 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.46042113 |
|---|---|
| Coefficient of variation (CV) | 1.49301596 |
| Kurtosis | 652.7401544 |
| Mean | 0.978168465 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 14.4255057 |
| Sum | 38891 |
| Variance | 2.132829876 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 18329 | 46.1% | |
| 0 | 14634 | 36.8% | |
| 2 | 3772 | 9.5% | |
| 3 | 1592 | 4.0% | |
| 4 | 673 | 1.7% | |
| 5 | 350 | 0.9% | |
| 6 | 152 | 0.4% | |
| 7 | 61 | 0.2% | |
| 8 | 54 | 0.1% | |
| 10 | 41 | 0.1% | |
| Other values (17) | 101 | 0.3% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 14634 | 36.8% | |
| 1 | 18329 | 46.1% | |
| 2 | 3772 | 9.5% | |
| 3 | 1592 | 4.0% | |
| 4 | 673 | 1.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 99 | 1 | < 0.1% | |
| 50 | 3 | < 0.1% | |
| 40 | 1 | < 0.1% | |
| 30 | 2 | < 0.1% | |
| 29 | 1 | < 0.1% |
X_9
Real number (ℝ≥0)
| Distinct count | 7 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.917980834528032 |
| Minimum | 0 |
| Maximum | 6 |
| Zeros | 200 |
| Zeros (%) | 0.5% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 5 |
| median | 5 |
| Q3 | 6 |
| 95-th percentile | 6 |
| Maximum | 6 |
| Range | 6 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.367461734 |
|---|---|
| Coefficient of variation (CV) | 0.2780534899 |
| Kurtosis | 1.252125374 |
| Mean | 4.917980835 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -1.517828575 |
| Sum | 195534 |
| Variance | 1.869951595 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 17610 | 44.3% | |
| 6 | 15781 | 39.7% | |
| 2 | 5091 | 12.8% | |
| 3 | 762 | 1.9% | |
| 1 | 310 | 0.8% | |
| 0 | 200 | 0.5% | |
| 4 | 5 | < 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 200 | 0.5% | |
| 1 | 310 | 0.8% | |
| 2 | 5091 | 12.8% | |
| 3 | 762 | 1.9% | |
| 4 | 5 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 15781 | 39.7% | |
| 5 | 17610 | 44.3% | |
| 4 | 5 | < 0.1% | |
| 3 | 762 | 1.9% | |
| 2 | 5091 | 12.8% |
X_10
Real number (ℝ≥0)
| Distinct count | 26 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.243366281848135 |
| Minimum | 1 |
| Maximum | 90 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 90 |
| Range | 89 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.017419435 |
|---|---|
| Coefficient of variation (CV) | 0.8182781294 |
| Kurtosis | 2000.81086 |
| Mean | 1.243366282 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 30.92348051 |
| Sum | 49435 |
| Variance | 1.035142307 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33618 | 84.6% | |
| 2 | 4532 | 11.4% | |
| 3 | 924 | 2.3% | |
| 4 | 364 | 0.9% | |
| 5 | 114 | 0.3% | |
| 6 | 92 | 0.2% | |
| 8 | 25 | 0.1% | |
| 10 | 25 | 0.1% | |
| 7 | 23 | 0.1% | |
| 9 | 11 | < 0.1% | |
| Other values (16) | 31 | 0.1% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 33618 | 84.6% | |
| 2 | 4532 | 11.4% | |
| 3 | 924 | 2.3% | |
| 4 | 364 | 0.9% | |
| 5 | 114 | 0.3% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 1 | < 0.1% | |
| 58 | 1 | < 0.1% | |
| 50 | 1 | < 0.1% | |
| 40 | 2 | < 0.1% | |
| 30 | 1 | < 0.1% |
X_11
Real number (ℝ≥0)
| Distinct count | 150 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 206.95434995849996 |
| Minimum | 0 |
| Maximum | 332 |
| Zeros | 4268 |
| Zeros (%) | 10.7% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 174 |
| median | 249 |
| Q3 | 249 |
| 95-th percentile | 316 |
| Maximum | 332 |
| Range | 332 |
| Interquartile range (IQR) | 75 |
Descriptive statistics
| Standard deviation | 93.0619573 |
|---|---|
| Coefficient of variation (CV) | 0.4496738403 |
| Kurtosis | 0.192539772 |
| Mean | 206.95435 |
| Median Absolute Deviation (MAD) | 67 |
| Skewness | -0.9031502716 |
| Sum | 8228298 |
| Variance | 8660.527897 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 174 | 12100 | 30.4% | |
| 249 | 11552 | 29.1% | |
| 316 | 7577 | 19.1% | |
| 0 | 4268 | 10.7% | |
| 303 | 707 | 1.8% | |
| 127 | 519 | 1.3% | |
| 179 | 357 | 0.9% | |
| 74 | 334 | 0.8% | |
| 102 | 208 | 0.5% | |
| 263 | 176 | 0.4% | |
| Other values (140) | 1961 | 4.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 4268 | 10.7% | |
| 1 | 3 | < 0.1% | |
| 6 | 3 | < 0.1% | |
| 11 | 7 | < 0.1% | |
| 12 | 1 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 332 | 4 | < 0.1% | |
| 330 | 39 | 0.1% | |
| 329 | 31 | 0.1% | |
| 328 | 120 | 0.3% | |
| 327 | 2 | < 0.1% |
X_12
Real number (ℝ≥0)
| Distinct count | 24 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 309 |
| Missing (%) | 0.8% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.9733333333333334 |
| Minimum | 0.0 |
| Maximum | 90.0 |
| Zeros | 8517 |
| Zeros (%) | 21.4% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 90 |
| Range | 90 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 1.060944796 |
|---|---|
| Coefficient of variation (CV) | 1.090011777 |
| Kurtosis | 1710.391344 |
| Mean | 0.9733333333 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 26.54109191 |
| Sum | 38398 |
| Variance | 1.12560386 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 26204 | 65.9% | |
| 0 | 8517 | 21.4% | |
| 2 | 3420 | 8.6% | |
| 3 | 797 | 2.0% | |
| 4 | 276 | 0.7% | |
| 5 | 101 | 0.3% | |
| 6 | 59 | 0.1% | |
| 8 | 18 | < 0.1% | |
| 7 | 14 | < 0.1% | |
| 10 | 11 | < 0.1% | |
| Other values (14) | 33 | 0.1% | |
| (Missing) | 309 | 0.8% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 8517 | 21.4% | |
| 1 | 26204 | 65.9% | |
| 2 | 3420 | 8.6% | |
| 3 | 797 | 2.0% | |
| 4 | 276 | 0.7% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 90 | 1 | < 0.1% | |
| 58 | 1 | < 0.1% | |
| 50 | 1 | < 0.1% | |
| 40 | 2 | < 0.1% | |
| 30 | 1 | < 0.1% |
X_13
Real number (ℝ≥0)
| Distinct count | 68 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 85.21886868382002 |
| Minimum | 0 |
| Maximum | 117 |
| Zeros | 2 |
| Zeros (%) | < 0.1% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 72 |
| median | 98 |
| Q3 | 103 |
| 95-th percentile | 112 |
| Maximum | 117 |
| Range | 117 |
| Interquartile range (IQR) | 31 |
Descriptive statistics
| Standard deviation | 27.55532481 |
|---|---|
| Coefficient of variation (CV) | 0.3233476956 |
| Kurtosis | 1.1341156 |
| Mean | 85.21886868 |
| Median Absolute Deviation (MAD) | 11 |
| Skewness | -1.398063774 |
| Sum | 3388217 |
| Variance | 759.2959255 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 103 | 11775 | 29.6% | |
| 72 | 7612 | 19.1% | |
| 92 | 5353 | 13.5% | |
| 112 | 3468 | 8.7% | |
| 98 | 2307 | 5.8% | |
| 18 | 1399 | 3.5% | |
| 24 | 886 | 2.2% | |
| 109 | 848 | 2.1% | |
| 12 | 702 | 1.8% | |
| 59 | 560 | 1.4% | |
| Other values (58) | 4849 | 12.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 2 | < 0.1% | |
| 1 | 8 | < 0.1% | |
| 2 | 382 | 1.0% | |
| 7 | 2 | < 0.1% | |
| 8 | 3 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 117 | 1 | < 0.1% | |
| 116 | 466 | 1.2% | |
| 115 | 31 | 0.1% | |
| 114 | 20 | 0.1% | |
| 113 | 367 | 0.9% |
X_14
Real number (ℝ≥0)
| Distinct count | 69 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 72.49201438667974 |
| Minimum | 0 |
| Maximum | 142 |
| Zeros | 458 |
| Zeros (%) | 1.2% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 25 |
| Q1 | 29 |
| median | 62 |
| Q3 | 107 |
| 95-th percentile | 142 |
| Maximum | 142 |
| Range | 142 |
| Interquartile range (IQR) | 78 |
Descriptive statistics
| Standard deviation | 43.35376456 |
|---|---|
| Coefficient of variation (CV) | 0.5980488323 |
| Kurtosis | -1.324487842 |
| Mean | 72.49201439 |
| Median Absolute Deviation (MAD) | 33 |
| Skewness | 0.2532434153 |
| Sum | 2882210 |
| Variance | 1879.548901 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 29 | 13659 | 34.4% | |
| 93 | 5140 | 12.9% | |
| 142 | 4557 | 11.5% | |
| 62 | 4070 | 10.2% | |
| 80 | 2529 | 6.4% | |
| 130 | 1976 | 5.0% | |
| 107 | 1234 | 3.1% | |
| 14 | 1158 | 2.9% | |
| 119 | 943 | 2.4% | |
| 103 | 842 | 2.1% | |
| Other values (59) | 3651 | 9.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 458 | 1.2% | |
| 2 | 1 | < 0.1% | |
| 6 | 213 | 0.5% | |
| 10 | 1 | < 0.1% | |
| 12 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 142 | 4557 | 11.5% | |
| 140 | 108 | 0.3% | |
| 139 | 13 | < 0.1% | |
| 138 | 227 | 0.6% | |
| 136 | 101 | 0.3% |
X_15
Real number (ℝ≥0)
| Distinct count | 36 |
|---|---|
| Unique (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33.44789858899872 |
| Minimum | 0 |
| Maximum | 50 |
| Zeros | 1680 |
| Zeros (%) | 4.2% |
| Memory size | 310.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 23 |
| Q1 | 34 |
| median | 34 |
| Q3 | 34 |
| 95-th percentile | 46 |
| Maximum | 50 |
| Range | 50 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 8.357811091 |
|---|---|
| Coefficient of variation (CV) | 0.2498755211 |
| Kurtosis | 8.811395375 |
| Mean | 33.44789859 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.54436585 |
| Sum | 1329855 |
| Variance | 69.85300624 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 34 | 31646 | 79.6% | |
| 43 | 2504 | 6.3% | |
| 0 | 1680 | 4.2% | |
| 46 | 1079 | 2.7% | |
| 23 | 1063 | 2.7% | |
| 48 | 864 | 2.2% | |
| 36 | 307 | 0.8% | |
| 50 | 217 | 0.5% | |
| 9 | 170 | 0.4% | |
| 39 | 82 | 0.2% | |
| Other values (26) | 147 | 0.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1680 | 4.2% | |
| 1 | 1 | < 0.1% | |
| 2 | 1 | < 0.1% | |
| 3 | 1 | < 0.1% | |
| 4 | 4 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 50 | 217 | 0.5% | |
| 48 | 864 | 2.2% | |
| 47 | 1 | < 0.1% | |
| 46 | 1079 | 2.7% | |
| 43 | 2504 | 6.3% |
MULTIPLE_OFFENSE
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 15903 |
| Missing (%) | 40.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 22788 | 57.3% | |
| 0 | 1068 | 2.7% | |
| (Missing) | 15903 | 40.0% |
is_test_data
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 310.6 KiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 23856 | 60.0% | |
| 1 | 15903 | 40.0% |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
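The calculation of r can be sketched in a few lines of plain Python. This is a minimal illustration of the covariance-over-standard-deviations definition, not the code the profiler runs; the function name `pearson_r` and the sample data are made up for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of xs and ys over the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))  # linear and increasing
print(pearson_r(x, [-3 * v for v in x]))     # linear and decreasing
print(pearson_r(x, [v ** 2 for v in x]))     # monotonic but not linear
```

The first two calls illustrate the invariance to location and scale: any increasing line gives +1 and any decreasing line gives -1, regardless of slope or offset. The quadratic relationship scores slightly below 1.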
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
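As a rough sketch of that definition, ρ can be computed by ranking both variables and applying Pearson's formula to the ranks. The helper names below are illustrative, and giving tied values the average of their ranks is an assumption of this sketch (it is the usual convention).

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks of xs and ys."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # nonlinear but strictly increasing
```

Because only the ordering matters, a strictly increasing nonlinear relationship such as the cubic above still yields ρ = 1.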
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
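The pair-counting definition translates directly into code. The sketch below implements exactly the formula stated in the text (often called tau-a); statistical libraries commonly report a tie-adjusted variant (tau-b), so values can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in x or y count as neither concordant nor discordant.
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))  # all six pairs concordant
print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # five concordant, one discordant
```

In the second call one of the six pairs is discordant, so τ = (5 - 1) / 6.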
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.
First rows
| | df_index | INCIDENT_ID | DATE | X_1 | X_2 | X_3 | X_4 | X_5 | X_6 | X_7 | X_8 | X_9 | X_10 | X_11 | X_12 | X_13 | X_14 | X_15 | MULTIPLE_OFFENSE | is_test_data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | CR_102659 | 04-JUL-04 | 0 | 36 | 34 | 2 | 1 | 5 | 6 | 1 | 6 | 1 | 174 | 1.0 | 92 | 29 | 36 | 0.0 | 0 |
| 1 | 1 | CR_189752 | 18-JUL-17 | 1 | 37 | 37 | 0 | 0 | 11 | 17 | 1 | 6 | 1 | 236 | 1.0 | 103 | 142 | 34 | 1.0 | 0 |
| 2 | 2 | CR_184637 | 15-MAR-17 | 0 | 3 | 2 | 3 | 5 | 1 | 0 | 2 | 3 | 1 | 174 | 1.0 | 110 | 93 | 34 | 1.0 | 0 |
| 3 | 3 | CR_139071 | 13-FEB-09 | 0 | 33 | 32 | 2 | 1 | 7 | 1 | 1 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | 1.0 | 0 |
| 4 | 4 | CR_109335 | 13-APR-05 | 0 | 33 | 32 | 2 | 1 | 8 | 3 | 0 | 5 | 1 | 174 | 0.0 | 112 | 29 | 43 | 1.0 | 0 |
| 5 | 5 | CR_96263 | 07-APR-03 | 0 | 45 | 45 | 10 | 3 | 1 | 0 | 1 | 6 | 1 | 303 | 1.0 | 72 | 62 | 34 | 1.0 | 0 |
| 6 | 6 | CR_131400 | 22-JAN-08 | 0 | 30 | 35 | 7 | 3 | 7 | 1 | 0 | 5 | 1 | 174 | 0.0 | 112 | 29 | 43 | 1.0 | 0 |
| 7 | 7 | CR_11981 | 14-MAY-93 | 0 | 8 | 7 | 7 | 3 | 9 | 8 | 0 | 5 | 1 | 316 | 1.0 | 72 | 62 | 34 | 1.0 | 0 |
| 8 | 8 | CR_184134 | 21-AUG-16 | 0 | 49 | 49 | 6 | 5 | 8 | 3 | 1 | 1 | 1 | 316 | 1.0 | 103 | 14 | 34 | 1.0 | 0 |
| 9 | 9 | CR_32634 | 25-AUG-96 | 1 | 4 | 4 | 6 | 5 | 15 | 10 | 0 | 5 | 2 | 145 | 1.0 | 103 | 29 | 34 | 0.0 | 0 |
Last rows
| | df_index | INCIDENT_ID | DATE | X_1 | X_2 | X_3 | X_4 | X_5 | X_6 | X_7 | X_8 | X_9 | X_10 | X_11 | X_12 | X_13 | X_14 | X_15 | MULTIPLE_OFFENSE | is_test_data |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39749 | 15893 | CR_148375 | 22-OCT-11 | 0 | 3 | 2 | 3 | 5 | 1 | 0 | 0 | 5 | 2 | 249 | 2.0 | 103 | 80 | 34 | NaN | 1 |
| 39750 | 15894 | CR_67736 | 29-JUN-00 | 0 | 21 | 23 | 4 | 1 | 6 | 4 | 0 | 5 | 1 | 174 | 1.0 | 98 | 93 | 34 | NaN | 1 |
| 39751 | 15895 | CR_185890 | 29-JUN-17 | 0 | 5 | 5 | 3 | 5 | 8 | 3 | 1 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | NaN | 1 |
| 39752 | 15896 | CR_89868 | 11-MAY-03 | 0 | 3 | 2 | 3 | 5 | 1 | 0 | 1 | 6 | 1 | 0 | 1.0 | 72 | 29 | 34 | NaN | 1 |
| 39753 | 15897 | CR_148343 | 01-SEP-11 | 0 | 3 | 2 | 3 | 5 | 1 | 0 | 3 | 6 | 1 | 303 | 1.0 | 72 | 29 | 34 | NaN | 1 |
| 39754 | 15898 | CR_44468 | 28-NOV-97 | 1 | 22 | 22 | 7 | 3 | 15 | 10 | 0 | 5 | 1 | 174 | 0.0 | 72 | 29 | 43 | NaN | 1 |
| 39755 | 15899 | CR_158460 | 09-JUN-12 | 0 | 35 | 30 | 3 | 5 | 1 | 0 | 2 | 3 | 2 | 0 | 2.0 | 72 | 93 | 34 | NaN | 1 |
| 39756 | 15900 | CR_115946 | 22-APR-06 | 0 | 26 | 27 | 9 | 0 | 6 | 4 | 2 | 6 | 1 | 0 | 1.0 | 72 | 62 | 34 | NaN | 1 |
| 39757 | 15901 | CR_137663 | 03-APR-09 | 0 | 21 | 23 | 4 | 1 | 2 | 7 | 1 | 6 | 2 | 249 | 2.0 | 92 | 62 | 34 | NaN | 1 |
| 39758 | 15902 | CR_33545 | 24-APR-96 | 0 | 4 | 4 | 6 | 5 | 4 | 2 | 5 | 6 | 1 | 249 | 1.0 | 72 | 29 | 34 | NaN | 1 |